Abstract

The conceptual component of this work is about "reference surfaces" which are the dual of reference frames 
often used for shape representation purposes. The theoretical component of this work involves the question 
of whether one can find a unique (and simple) mapping that aligns two arbitrary perspective views of an 
opaque textured quadric surface in 3D, given (i) few corresponding points in the two views, or (ii) the 
outline conic of the surface in one view (only) and few corresponding points in the two views. The practical 
component of this work is concerned with applying the theoretical results as tools for the task of achieving 
full correspondence between views of arbitrary objects. 
1 Introduction

This paper has three main goals, which in a way stand 
on their own. First, to support and extend the concept 
of a "reference surface", which currently exists only in
rudimentary form. Second, to introduce a natural appli- 
cation for more advanced reference surfaces for the pur- 
pose of achieving visual correspondence or registration 
across images (views) of 3D scenes. Third, to introduce 
new theoretical results on a specific class of reference sur- 
faces, the quadrics. The crux of this work is embedded 
in the third goal, yet we emphasize briefly the first two 
goals as they provide a context for one particular use of 
our results, and reasons for pushing further along these 
and similar lines (see also the discussion in Section 6). 

Reference surfaces are simply the dual of reference 
frames used for shape representation. An object in 3D 
space is represented relative to some frame. For example, 
if we model an object as a collection of points, then in 
affine space a minimal frame must consist of four points 
in general position; in projective space a minimal frame 
consists of five points in general position. In a dual man- 
ner, in affine space a reference plane is minimally nec- 
essary for shape representation; in projective space we 
have the tetrahedron of reference. Work along the lines 
of representing shape using minimal frame configurations 
and recovery from views can be found in [9, 8, 4, 27, 28], 
and in further references therein. 

As long as we use the minimal configuration of points 
for representing shape, there is no practical reason to 
distinguish between reference frames and reference sur- 
faces. The distinction becomes useful, as we shall see 
later, when we choose non-minimal frames; their dual 
corresponds to non-planar reference surfaces. Before we 
elaborate further on the duality between reference frames 
and reference surfaces, it would be useful to consider a 
specific application in which the notion of reference sur- 
faces appears explicitly. 

Consider the problem of achieving correspondence, or 
optical flow as it is known in the motion literature. The 
task is to recover the 2D displacement vector field be- 
tween points across two images, in particular in the case 
where the two images are two distinct views of some 3D 
object or scene. Typical applications that initially require
full correspondence (that is, correspondence for all image
points) include the measurement of motion, stereopsis,
structure from motion, 3D reconstruction from point
correspondences, and more recently, visual recognition,
active vision and computer graphics animation.

The concept of reference surfaces becomes relevant in 
this context when we consider the correspondence prob- 
lem as composed of two stages: (i) a nominal transfor- 
mation phase due to a reference surface, and (ii) recov- 
ery of a residual field (cf. [24, 25, 2]). In other words, 
we envision some virtual reference surface on the ob- 
ject and project the object onto that surface along the 
lines of sight associated with one of the views. As a re- 
sult, we have two surfaces, the original object and a vir- 
tual object (the reference surface). The correspondence 
field between the two views generated by the virtual surface
can be characterized by a closed-form transformation
(the "nominal transformation"). The differences between
the corresponding points coming from the original
surface and the corresponding points coming from the
virtual surface are along epipolar lines and are small in
regions where the reference surface lies close to the original
surface. These remaining displacements are referred
to as the "residual displacements". The residuals are
recovered using instantaneous spatio-temporal derivatives
of image intensity values along the epipolar lines (see
Fig. 1).

As a simple example, consider the case where the ref- 
erence surface is a plane. It is worth noting that pla- 
nar reference surfaces are also found in the context of 
navigation and obstacle detection in active vision ap- 
plications [29, 30, 13] as well as in infinitesimal motion 
models for visual reconstruction [7, 10, 23]. A planar 
reference surface corresponds to the dual case of shape 
representation under parallel projection (cf. [9]), or rela- 
tive affine structure under perspective projection [26, 28]. 
In other words, a nominal transformation is either a 2D 
affine transformation or a 2D projective transformation, 
depending on whether we assume an orthographic or 
perspective model of projection. The magnitude of the 
residual field is thus small in image regions that corre- 
spond to object points that are close to the reference 
plane, and the magnitude is large in regions that cor- 
respond to object points that are far away from the 
reference plane. This is demonstrated in Fig. 2. The
top row shows two views of a face obtained by
rotation of the head approximately around the vertical
axis of the neck. Three points were chosen (two eyes
and the right mouth corner) for computing the nominal
transformation. The overlay of the second view and the
transformed first view demonstrates (bottom row) that
the central region of the face is brought closer at the
expense of regions near the boundary, which correspond to
object points that are far away from the virtual plane
passing through both eyes and the mouth corner.

This example naturally suggests that a nominal trans- 
formation based on placing a virtual quadric reference 
surface on the object would give rise to a smaller resid- 
ual field — for this particular class of objects. A quadric 
reference surface is a natural extension of the planar case 
and, as the example above demonstrates, may be a useful 
tool for the application of visual correspondence. 

In terms of duality between frames for shape repre- 
sentation and reference surfaces, the quadric reference 
frame will require a non-minimal configuration (of points 
and other forms). This configuration can also serve as a 
frame for shape representation, but the property we em- 
phasize here is the use of its dual — the quadric reference 
surface. 

The theoretical component of this work is therefore
concerned with establishing a quadric reference surface
from image information across two views. We start by
addressing the following questions: First, given any two
views of some unknown textured opaque quadric surface
in 3D projective space $\mathcal{P}^3$, is there a finite number of
corresponding points across the two views that uniquely
determine all other correspondences coming from points
on the quadric? Second, can the unique mapping be
determined by the outline conic in one of the views
(projection of the rim) and a smaller number of corresponding
points? A constructive answer to these questions readily
suggests that we can associate a virtual quadric surface
with any 3D object (not necessarily itself a quadric) and
use it for describing shape, but more importantly, for
achieving full correspondence between the two views.

Figure 1: Schematic illustration of the main concepts. The object is projected onto a virtual quadric along the line of sight.
Points on the quadric are then projected onto the second image plane. The deviation of the object from the virtual quadric is
a measure of shape (the quadric is a reference surface); the transformation $T(p)$ due to the quadric is the "nominal quadratic
transformation"; the displacement between $p'$ and $T(p)$ is along epipolar lines and is called "residual displacement". This
paper is about deriving (Theorems 1, 4) general methods for recovering $T(p)$ given a small number of corresponding points
across the two views (four points and the epipoles are sufficient).

Figure 2: The case of a planar reference surface. (a), (b) are images of two views of a face, first view $\psi_1$ on the left and second
view $\psi_2$ on the right. Edges are superimposed on (a) for illustrative purposes. (c) Overlaid edges of $\psi_1$ and $\psi_2$. (d) The
residual displacements (see text and Fig. 1) resulting from a planar reference surface. The planar nominal transformation is
the 2D affine transformation determined by three corresponding points: the two eyes and the right mouth corner of the
face. Notice that the displacements across the center region of the face are reduced, at the expense of the peripheral regions
which are taken farther apart.

On the conceptual level we propose combining geo- 
metric constraints, captured from knowledge of a small 
number of corresponding points (manually given, for ex- 
ample), and photometric constraints captured by the in- 
stantaneous spatio-temporal changes in image brightness 
(conventional optical flow). The geometric constraints 
we propose are related to the virtual quadric surface 
mentioned above. These constraints lead to a trans- 
formation (a nominal quadratic transformation) that is 
applied to one of the views with the result of bringing 
both views closer together. The remaining displace- 
ments (residuals) are recovered either by optical flow 
techniques using the spatial and temporal derivatives of 
image brightness or by correlation of image patches. 

2 Notation 

We consider object space to be the 3D projective space
$\mathcal{P}^3$, and image space to be the 2D projective space $\mathcal{P}^2$,
both over the field $\mathbb{C}$ of complex numbers. Views are
denoted by $\psi_i$, indexed by $i$. The epipoles are denoted
by $v \in \psi_1$ and $v' \in \psi_2$, and we assume their locations are
known (for methods, see [4, 5, 27, 28, 12], for example,
and briefly later in the text). The symbol $\cong$ denotes
equality up to scale, $GL_n$ stands for the group of invertible $n \times n$
matrices, $PGL_n$ is the group defined up to scale, and
$SPGL_n$ is the symmetric specialization of $PGL_n$.

3 The Quadric Reference Surface I: 
Points 

We start with recovering the parameters of a quadric, 
modeled as a cloud of points, from two of its projections. 
The problem is straightforward if the two projection cen- 
ters are on the surface (Result 1). The general case (The- 
orem 1) is also made easier by resorting to projective 
reconstruction via a simple and convenient parameter- 
ization of space (Lemma 1). Guaranteeing uniqueness 
of the mapping between the two views of the quadric 
is somewhat challenging because the ray from a projec- 
tion center generally intersects the quadric at two points. 
This situation is disambiguated by combining an "opac- 
ity" assumption (Definition 1) with the parameterization 
used for recovering the quadric parameters (Lemma 2). 
Finally, as a byproduct of these derivations, one can 
readily obtain quantitative and simple measures relating 
the projection centers and the family of quadrics pass- 
ing through configurations of eight points (Theorem 2). 
This may have applications in the analysis of "critical 
surfaces" (Corollary 1). 

Result 1 Given two arbitrary views $\psi_1, \psi_2 \subset \mathcal{P}^2$ of a
quadric surface $Q \subset \mathcal{P}^3$ with centers of projection at
$O, O' \in \mathcal{P}^3$, and $O, O' \in Q$, then five corresponding
points across the two views uniquely determine all other
correspondences.

Proof. Let $(x_0, x_1, x_2)$ and $(x'_0, x'_1, x'_2)$ be coordinates
of $\psi_1$ and $\psi_2$, respectively, and $(z_0, \ldots, z_3)$ be
coordinates of $Q$. Let $O = (0,0,0,1)$; then the quadric
surface may be given as the locus $z_0 z_3 - z_1 z_2 = 0$,
and $\psi_1$ as the projection from $O = (0,0,0,1)$ onto
the plane $z_3 = 0$. In the case where the centers of
projection are on $Q$, the line through $O$ meets $Q$ in
exactly one other point, and thus the mapping $\psi_1 \mapsto Q$
is generically one-to-one, and so has a rational
inverse: $(x_0, x_1, x_2) \mapsto (x_0^2, x_0 x_1, x_0 x_2, x_1 x_2)$. Because all
quadric surfaces of the same rank are projectively
equivalent, we can perform a similar blow-up from $\psi_2$ with
the result $(x_0'^2, x'_0 x'_1, x'_0 x'_2, x'_1 x'_2)$. The projective
transformation $A \in PGL_4$ between the two representations of
$Q$ can then be recovered from five corresponding points
between the two images. □
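
The blow-up used in the proof is easy to check numerically. The following is a minimal sketch, not part of the original derivation; the helper names `blow_up`, `on_quadric` and `project` are hypothetical and chosen only for illustration. It maps an image point to the quadric $z_0 z_3 - z_1 z_2 = 0$ and confirms that projecting from $O = (0,0,0,1)$ onto $z_3 = 0$ returns the same image point up to scale.

```python
def blow_up(x0, x1, x2):
    """Rational inverse of the projection from O = (0, 0, 0, 1):
    image point (x0, x1, x2) -> point (z0, z1, z2, z3) on the quadric."""
    return (x0 * x0, x0 * x1, x0 * x2, x1 * x2)


def on_quadric(z):
    """True when z satisfies z0*z3 - z1*z2 = 0."""
    z0, z1, z2, z3 = z
    return z0 * z3 - z1 * z2 == 0


def project(z):
    """Projection from O onto the plane z3 = 0 (drop the last coordinate)."""
    return z[:3]


for p in [(1, 2, 3), (2, -1, 5), (3, 0, 7)]:
    z = blow_up(*p)
    # Projecting back gives x0 * p, i.e. the same image point up to scale.
    assert on_quadric(z)
    assert project(z) == tuple(p[0] * c for c in p)
```

Integer inputs keep the check exact; image points with $x_0 = 0$ are the exceptional locus of the map and are deliberately left out of the sketch.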

This result does not hold when the centers of projec- 
tion are not on the quadric surface. This is because 
the mapping between Q and V 2 is not one-to-one (a 
ray through the center of projection meets Q in two 
points), and therefore, a rational inverse does not exist. 
We are interested in establishing a more general result 
that applies when the centers of projection are not on 
the quadric surface. One way to enforce a one-to-one 
mapping is by making "opacity" assumptions, defined 
below. 

Definition 1 (Opacity Constraint) Given an object
$Q = \{P_1, \ldots, P_n\}$, we assume there exists a plane
through the camera center $O$ that does not intersect any
of the chords $P_i P_j$ (i.e., $Q$ is observed from only one
"side" of the camera). Furthermore, we assume that the
surface is opaque, which means that among all the
surface points along a ray from $O$, the closest point to $O$
is the point that also projects to the second view ($\psi_2$).
The first constraint, therefore, is a camera opacity
assumption, and the second constraint is a surface opacity
assumption, which together we call the opacity
constraint.

With an appropriate parameterization of $\mathcal{P}^3$ we can
obtain the following result:

Theorem 1 Given two arbitrary views $\psi_1, \psi_2 \subset \mathcal{P}^2$ of
an opaque quadric surface $Q \subset \mathcal{P}^3$; then nine corresponding
points across the two views uniquely determine all
other correspondences.

The following auxiliary propositions are used as part of 
the proof. 

Lemma 1 (Relative Affine Parameterization)
Let $p_0, p_1, p_2, p_3$ and $p'_0, p'_1, p'_2, p'_3$ be four corresponding
points coming from four non-coplanar points in space.
Let $A$ be a collineation (homography) of $\mathcal{P}^2$ determined
by the equations $A p_j \cong p'_j$, $j = 1,2,3$, and $Av \cong v'$.
Finally let $v'$ be scaled such that $p'_0 \cong A p_0 + v'$. Then,
for any point $P \in \mathcal{P}^3$ projecting onto $p$ and $p'$, we have

$$p' \cong Ap + k v'. \qquad (1)$$

The coefficient $k = k(p)$ is independent of $\psi_2$, i.e., is
invariant to the choice of the second view, and the
projective coordinates of $P$ are $(x, y, 1, k)^T$.



The lemma, its proof and its theoretical and practical
implications are discussed in detail in [26, 28]. The
scalar $k$ is called a relative affine invariant and can be
computed with the assistance of a second arbitrary view
$\psi_2$. In a nutshell, a representation $\mathcal{R}$ of $\mathcal{P}^3$ is chosen
such that the projection center $O$ of the first camera
position is part of the reference frame (of five points). The
matrix $A$ is the 2D projective transformation due to the
plane $\pi$ passing through the object points $P_1, P_2, P_3$, i.e.,
for any $P \in \pi$ we have $p' \cong Ap$. The representation $\mathcal{R}$ is
associated with $[x, y, 1, k]$ where $k$ vanishes for all points
coplanar with $\pi$, which means that $\pi$ is the plane at
infinity under the representation $\mathcal{R}$. Finally, the
transformation between $\mathcal{R}$ and the corresponding representation as seen
from any other camera position (uncalibrated) can be
described by an element of the affine group, i.e., the
scalar $k$ is an affine invariant relative to $\mathcal{R}$.
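
To make Equation 1 concrete, the sketch below computes $k$ for a single correspondence. It is an illustration under stated assumptions rather than the authors' code: $A$ and $v'$ are taken as already known and scaled as in Lemma 1, points are homogeneous 3-tuples, and the function name `relative_affine_k` is hypothetical. Since $p'$ must be parallel to $Ap + kv'$, taking the cross product with $p'$ gives $k\,(p' \times v') = Ap \times p'$, which is solved for $k$ in the least-squares sense.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def matvec(A, p):
    """Product of a 3x3 matrix (list of rows) with a 3-vector."""
    return tuple(dot(row, p) for row in A)


def relative_affine_k(A, v_prime, p, p_prime):
    """Least-squares k such that p' is parallel to A p + k v'.

    Undefined when p' coincides with the epipole v' (zero denominator).
    """
    n = cross(p_prime, v_prime)
    return dot(n, cross(matvec(A, p), p_prime)) / dot(n, n)
```

The result does not depend on the scale of $p'$: with $A$ the identity, $v' = (1,0,0)$, $p = (1,2,1)$ and $k = 3$, the point $p' = (4,2,1)$ and its multiple $(8,4,2)$ give the same value.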

Proof of Theorem: From Lemma 1, any point $P$ can
be represented by the coordinates $(x, y, 1, k)$ and $k$ can be
computed from Equation 1. Since $Q$ is a quadric surface,
there exists $H \in SPGL_4$ such that $P^T H P = 0$ for all
points $P$ of the quadric. Because $H$ is symmetric and
determined up to scale, it contains only nine independent
parameters. Therefore, given nine corresponding image
points we can solve for $H$ as a solution of a linear system;
each corresponding pair $p, p'$ provides one linear equation
in $H$ of the form $(x, y, 1, k) H (x, y, 1, k)^T = 0$.

Given that we have solved for $H$, the mapping $\psi_1 \mapsto \psi_2$
due to the quadric $Q$ can be determined uniquely
(i.e., for every $p \in \psi_1$ we can find the corresponding
$p' \in \psi_2$) as follows. The equation $P^T H P = 0$ gives rise
to a second-order equation in $k$ of the form $ak^2 + b(p)k + c(p) = 0$,
where the coefficient $a$ is constant (depends
only on $H$) and the coefficients $b, c$ depend also on the
location of $p$. Therefore, we have two solutions for $k$, and
by Equation 1, two solutions for $p'$. The two solutions for
$k$ are $k^1, k^2 = \frac{-b \pm r}{2a}$, where $r = \sqrt{b^2 - 4ac}$. The finding,
shown in the next auxiliary lemma, is that if the surface
$Q$ is opaque, then the sign of $r$ is fixed for all $p \in \psi_1$.
Therefore, the sign of $r$ for $p_0$ that leads to a positive
root (recall that $k_0 = 1$) determines the sign of $r$ for all
other $p \in \psi_1$. □
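
The linear solve in the first part of the proof can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the nine points are already available as triples $(x, y, k)$, uses exact rational arithmetic to sidestep numerical issues, and the helper names `null_vector` and `fit_quadric` are hypothetical. Each point contributes one row in the ten upper-triangular entries of $H$, and $H$ is read off the null vector of the resulting $9 \times 10$ system.

```python
from fractions import Fraction

# Upper-triangular index pairs of a symmetric 4x4 matrix (10 unknowns).
IDX = [(i, j) for i in range(4) for j in range(i, 4)]


def null_vector(rows):
    """Return one non-zero vector n with rows * n = 0 (Gauss-Jordan, exact)."""
    m = [[Fraction(v) for v in r] for r in rows]
    ncols = len(m[0])
    pivots = []  # pivot column of each reduced row
    r = 0
    for c in range(ncols):
        pr = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pr is None:
            continue
        m[r], m[pr] = m[pr], m[r]
        piv = m[r][c]
        m[r] = [v / piv for v in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    free = next(c for c in range(ncols) if c not in pivots)
    n = [Fraction(0)] * ncols
    n[free] = Fraction(1)
    for row, pc in zip(m, pivots):
        n[pc] = -row[free]
    return n


def fit_quadric(points):
    """Symmetric 4x4 H (up to scale) with P^T H P = 0 for P = (x, y, 1, k)."""
    rows = []
    for x, y, k in points:
        P = (Fraction(x), Fraction(y), Fraction(1), Fraction(k))
        rows.append([P[i] * P[j] * (1 if i == j else 2) for i, j in IDX])
    n = null_vector(rows)
    H = [[Fraction(0)] * 4 for _ in range(4)]
    for (i, j), v in zip(IDX, n):
        H[i][j] = H[j][i] = v
    return H
```

With noisy image measurements one would more likely use more than nine points and a least-squares null vector; that variant is not discussed in the text and is mentioned here only as a design alternative.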

Lemma 2 Given the opacity constraint, the sign of the
term $r = \sqrt{b^2 - 4ac}$ is fixed for all points $p \in \psi_1$.

Proof. Let $P$ be a point on the quadric projecting onto
$p$ in the first image, and let the ray $OP$ intersect the
quadric at points $P^1, P^2$, and let $k^1, k^2$ be the roots of
the quadratic equation $ak^2 + b(p)k + c(p) = 0$. The
opacity assumption is that the intersection closer to $O$
is the point projecting onto $p$ and $p'$.

Recall that $P_0$ is a point (on the quadric in this case)
used for setting the scale of $v'$ (in Equation 1), i.e.,
$k_0 = 1$. Therefore, all points that are on the same side
of $\pi$ as $P_0$ have positive $k$ associated with them, and
vice versa (similar logic was used in [21] for convex-hull
computations). There are two cases to be considered:
either $P_0$ is between $O$ and $\pi$ (i.e., $O < P_0 < \pi$), or
$\pi$ is between $O$ and $P_0$ (i.e., $O < \pi < P_0$), that is,
$O$ and $P_0$ are on opposite sides of $\pi$. In the first case,
if $k^1 k^2 \le 0$ then the non-negative root is closer to $O$,
i.e., $k = \max(k^1, k^2)$. If both roots are negative, the
one closer to zero is closer to $O$, again $k = \max(k^1, k^2)$.
Finally, if both roots are positive, then the larger root
is closer to $O$. Similarly, in the second case we have
$k = \min(k^1, k^2)$ for all combinations. Because $P_0$ can
satisfy either of these two cases, the opacity assumption
then gives rise to a consistency requirement in picking
the right root: either the maximum root or the minimum
root should be uniformly chosen for all points. □
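
The consistency requirement translates directly into a root-selection rule. The sketch below is an illustration under stated assumptions: $H$ is a known symmetric $4 \times 4$ matrix with $h_{44} \neq 0$ (the first camera center is not on the quadric), the reference point $p_0$ has $k_0 = 1$, and the function names are hypothetical. The sign of $r$ is fixed once from $p_0$ and then reused for every other image point.

```python
import math


def quadratic_coeffs(H, x, y):
    """Coefficients of a k^2 + b k + c = 0 from (x, y, 1, k) H (x, y, 1, k)^T = 0."""
    p = (x, y, 1.0)
    a = H[3][3]
    b = 2.0 * sum(H[i][3] * p[i] for i in range(3))
    c = sum(p[i] * H[i][j] * p[j] for i in range(3) for j in range(3))
    return a, b, c


def root_sign(H, x0, y0, k0=1.0):
    """Sign (+1 or -1) of r for which the reference point p0 yields the root k0."""
    a, b, c = quadratic_coeffs(H, x0, y0)
    r = math.sqrt(b * b - 4.0 * a * c)
    plus, minus = (-b + r) / (2.0 * a), (-b - r) / (2.0 * a)
    return 1 if abs(plus - k0) <= abs(minus - k0) else -1


def k_on_quadric(H, x, y, sign):
    """Root picked with the fixed sign; None if the line of sight misses Q."""
    a, b, c = quadratic_coeffs(H, x, y)
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None
    return (-b + sign * math.sqrt(disc)) / (2.0 * a)
```

For example, with $H$ representing $x^2 + y^2 + (k-2)^2 = 5$ and $p_0 = (2, 0)$, the root $k_0 = 1$ is the smaller of the two, so the minus sign is used for all other points.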

In Section 5 we will show that Theorem 1 can be used 
to surround an arbitrary 3D surface by a virtual quadric, 
i.e., to create quadric reference surfaces, which in turn 
can be used to facilitate the correspondence problem be- 
tween two views of a general object. The remainder of 
this section takes Theorem 1 further to quantify certain 
useful relationships between the centers of two cameras 
and the family of quadrics that pass through arbitrary 
configurations of eight points whose projections on the 
two views are known. 

Theorem 2 Given a quadric surface $Q \subset \mathcal{P}^3$
projected onto views $\psi_1, \psi_2 \subset \mathcal{P}^2$, with centers of projection
$O, O' \in \mathcal{P}^3$, there exists a parameterization of the image
planes $\psi_1, \psi_2$ that yields a representation $H \in SPGL_4$
of $Q$ such that $h_{44} = 0$ when $O \in Q$, and the sum of the
elements of $H$ vanishes when $O' \in Q$.

Proof. The re-parameterization described here was originally
introduced in [26] as part of the proof of Lemma 1.
We first assign the standard coordinates in $\mathcal{P}^3$ to three
points on $Q$ and to the two camera centers $O$ and $O'$ as
follows. We assign the coordinates $(1,0,0,0)$, $(0,1,0,0)$,
$(0,0,1,0)$ to $P_1, P_2, P_3$, respectively, and the coordinates
$(0,0,0,1)$, $(1,1,1,1)$ to $O, O'$, respectively. By construction,
the point of intersection of the line $OO'$ with $\pi$ has
the coordinates $(1,1,1,0)$.

Let $P$ be some point on $Q$ projecting onto $p, p'$. The
line $OP$ intersects $\pi$ at the point $(\alpha, \beta, \gamma, 0)$. The
coordinates $\alpha, \beta, \gamma$ can be recovered (up to scale) by the
mapping $\psi_1 \mapsto \pi$, as follows. Given the epipoles $v$ and $v'$, we
have by our choice of coordinates that $p_1, p_2, p_3$ and $v$
are projectively (in $\mathcal{P}^2$) mapped onto $e_1 = (1,0,0)$, $e_2 = (0,1,0)$,
$e_3 = (0,0,1)$ and $e = (1,1,1)$, respectively.
Therefore, there exists a unique element $A_1 \in PGL_3$
that satisfies $A_1 p_j \cong e_j$, $j = 1,2,3$, and $A_1 v \cong e$. Let
$A_1 p = (\alpha, \beta, \gamma)$. Similarly, the line $O'P$ intersects $\pi$ at
$(\alpha', \beta', \gamma', 0)$. Let $A_2 \in PGL_3$ be defined by $A_2 p'_j \cong e_j$,
$j = 1,2,3$, and $A_2 v' \cong e$. Let $A_2 p' = (\alpha', \beta', \gamma')$.

It is easy to see that $A \cong A_2^{-1} A_1$, where $A$ is the
collineation defined in Lemma 1. Likewise, the homogeneous
coordinates of $P$ are transformed into $(\alpha, \beta, \gamma, k)$.
With this new coordinate representation the assumption
$O \in Q$ translates to the constraint that $h_{44} = 0$
(that is, $(0,0,0,1) H (0,0,0,1)^T = 0$), and the assumption $O' \in Q$
translates to the constraint $(1,1,1,1) H (1,1,1,1)^T = 0$.
Note also that $h_{11} = h_{22} = h_{33} = 0$ due to the assignment
of standard coordinates to $P_1, P_2, P_3$. □

Corollary 1 Theorem 2 provides a quantitative mea- 
sure of the proximity of a set of eight 3D points, pro- 
jecting onto two views, to a quadric that contains both 
centers of projection. 



Proof. Given eight corresponding points we can solve
for $H$ with the constraint $(1,1,1,1) H (1,1,1,1)^T = 0$.
This is possible since a unique quadric exists for any set
of nine points in general position (the eight points and
$O'$). The value of $h_{44}$ is then indicative of how close the
quadric is to the other center of projection $O$. □

Note that when the camera center $O$ is on the quadric,
the leading term of $ak^2 + b(p)k + c(p) = 0$ vanishes
($a = h_{44} = 0$), and we are left with a linear function
of $k$. We see that it is sufficient to have a bi-rational
mapping between $Q$ and only one of the views without
employing the opacity constraint. This is because of the
asymmetry introduced in our method: the parameters
of $Q$ are reconstructed with respect to the frame of
reference $\mathcal{R}$, which includes the first camera center (i.e.,
relative affine reconstruction in the sense of [26]), rather
than reconstructed with respect to a purely object-based
frame (i.e., all five reference points coming from the
object). Also note the importance of obtaining quantitative
measures of proximity of an eight-point configuration
of 3D points to a quadric that contains both centers
of projection; this is a necessary condition for observing
a "critical surface". A sufficient condition is that the
quadric is a hyperboloid of one sheet [6, 15]. Theorem 2
provides, therefore, a tool for analyzing part of the
question of how likely typical imaging situations are to fall within
a "critical volume".

4 The Quadric Reference Surface II: 
Conic + Points 

The previous section dealt with the problem of recover- 
ing a unique mapping between two views of an opaque 
quadric from point correspondences. Here we deal with 
a similar problem, but in addition to observing point 
correspondences, we observe the outline (the projection 
of the rim) of the quadric in one of the images. On the 
theoretical level, this case is challenging because we are 
not using the reconstruction paradigm as in Theorem 1, 
simply because we are observing the outline in one view 
only. In this case the opacity constraint, as manifested 
computationally in Lemma 2, plays a significant role at 
the level of recovering the quadric's parameters; whereas 
in the previous section the opacity constraint was used 
only for disambiguating the mapping between the two 
views given the quadric's parameters. On the practical 
level, this case provides significant advantages over the 
previous case of using point matches only (see later in 
Section 5). 

Theorem 3 (Outline Conic) Let $H \in SPGL_4$ represent
a quadric surface $Q \subset \mathcal{P}^3$, and compose $H$ as

$$H = \begin{pmatrix} E & h \\ h^T & h_{44} \end{pmatrix}, \qquad (2)$$

where $E \in GL_3$ and is symmetric. Let $p = (x, y, 1)^T$ be a
point (in standard coordinate representation) in a view
$\psi_1 \subset \mathcal{P}^2$ of $Q$ with projection center $O = (0,0,0,1)^T$;
then

$$E' = hh^T - h_{44} E$$

represents the outline conic (the projection of the rim)
of $Q$ in $\psi_1$.



Proof. Let $P = (x, y, 1, k)^T$ be the coordinates of points
on $Q$. We then obtain

$$P^T H P = p^T E p + 2 k\, h^T p + h_{44} k^2 = 0.$$

The outline conic is defined by the border between the
real and complex conjugate roots of $k$. Thus, the roots of
$k$ are the solution to the equation $ak^2 + b(p)k + c(p) = 0$,
where

$$a = h_{44}, \qquad b(p) = 2 h^T p, \qquad c(p) = p^T E p.$$

The condition for real roots, as is required for points
coming from the quadric, is a non-negative discriminant
$\Delta = b^2 - 4ac \ge 0$, or

$$\Delta = 4\, p^T (hh^T - h_{44} E)\, p \ge 0.$$

Let $E' = hh^T - h_{44} E$. We see that the border between
real and complex roots is a conic described by $p^T E' p = 0$. □
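
The identity $\Delta = 4\, p^T E' p$ can be checked on a concrete quadric. The following minimal sketch assumes $H$ is given as a symmetric $4 \times 4$ list of rows in the block form (2); the function names are hypothetical and serve only to illustrate the theorem.

```python
def outline_conic(H):
    """E' = h h^T - h44 E for a symmetric 4x4 H = [[E, h], [h^T, h44]]."""
    h = [H[i][3] for i in range(3)]
    h44 = H[3][3]
    return [[h[i] * h[j] - h44 * H[i][j] for j in range(3)] for i in range(3)]


def conic_value(C, x, y):
    """p^T C p for p = (x, y, 1)."""
    p = (x, y, 1)
    return sum(p[i] * C[i][j] * p[j] for i in range(3) for j in range(3))


def discriminant(H, x, y):
    """b^2 - 4ac of the quadratic in k induced by P^T H P = 0."""
    p = (x, y, 1)
    b = 2 * sum(H[i][3] * p[i] for i in range(3))
    c = sum(p[i] * H[i][j] * p[j] for i in range(3) for j in range(3))
    return b * b - 4 * H[3][3] * c
```

For the quadric $x^2 + y^2 + (k-2)^2 = 5$, i.e. $E = \mathrm{diag}(1, 1, -1)$, $h = (0, 0, -2)^T$ and $h_{44} = 1$, the outline conic is $E' = \mathrm{diag}(-1, -1, 5)$, the circle $x^2 + y^2 = 5$, and the discriminant equals $4\, p^T E' p$ at every image point.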

Theorem 4 (Outline conic and four corresponding points)
Given two arbitrary views $\psi_1, \psi_2 \subset \mathcal{P}^2$ of an opaque
quadric surface $Q \subset \mathcal{P}^3$, and the outline conic of $Q$ in
$\psi_1$, then four corresponding points across the two views
uniquely determine all other correspondences.

Proof. Let $E' \in SPGL_3$ be the representation of the
given outline conic of $Q$ in $\psi_1$ and let $H$ be the
representation (having the form (2)) of $Q$ that we seek to
recover. From Theorem 3 we have $E' = hh^T - h_{44} E$.
Note that if $H$ is scaled by $\alpha$, then $E'$ is scaled by $\alpha^2$.
Thus, given $E'$ (with an arbitrary scale), we can hope
to recover $H$ at most up to a sign flip. What we need
to show is that with four corresponding points, coming
from a general configuration on $Q$, we can recover $h$ and
$h_{44}$ (up to the sign flip). Let $P = (x, y, 1, k)^T$ be a point
on $Q$ projecting onto $p = (x, y, 1)^T$ in $\psi_1$. We then have

$$h_{44}\, P^T H P = (p^T, k) \begin{pmatrix} hh^T - E' & h_{44} h \\ h_{44} h^T & h_{44}^2 \end{pmatrix} \begin{pmatrix} p \\ k \end{pmatrix} = 0,$$

which expands to

$$(p^T h + h_{44} k)^2 = p^T E' p = \frac{1}{4} \Delta,$$

or

$$p^T h + h_{44} k = \pm \frac{1}{2} \sqrt{\Delta}.$$

From Lemma 2, the $\sqrt{\Delta}$ terms are either all positive or all
negative; therefore if we are given four points with their
corresponding $k$, we have exactly two solutions for $(h, h_{44})$
(as a solution of a linear system). The two solutions are
$(h, h_{44})$ and $-(h, h_{44})$ and we can choose one of them
arbitrarily, since any $H$ representing $Q$ is only
determined up to scale. From Lemma 1, we can set $k = 0$ for
three of the four points, and $k = 1$ for the fourth point.
Finally, after recovering $H$, the correspondence $p'$ of any
fifth point $p$ can be uniquely determined (cf. Theorem 1). □
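
The linear system in the proof can be written out directly. The sketch below is only an illustration under stated assumptions: $E'$ is taken with a scale for which $p^T E' p > 0$ at the four points, the points are supplied as triples $(x, y, k)$ in general position (non-singular system, $h_{44} \neq 0$), the positive square root is used for all four points so that one of the two sign-flipped solutions is returned, and the function names `solve` and `quadric_from_outline` are hypothetical.

```python
import math


def solve(M, rhs):
    """Gauss-Jordan elimination with partial pivoting for a small square system."""
    n = len(M)
    aug = [list(map(float, row)) + [float(v)] for row, v in zip(M, rhs)]
    for c in range(n):
        piv = max(range(c, n), key=lambda i: abs(aug[i][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for i in range(n):
            if i != c:
                f = aug[i][c] / aug[c][c]
                aug[i] = [a - f * b for a, b in zip(aug[i], aug[c])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def quadric_from_outline(E_out, pts):
    """Recover a symmetric 4x4 H from the outline conic E' and four (x, y, k) points."""
    rows, rhs = [], []
    for x, y, k in pts:
        p = (x, y, 1.0)
        val = sum(p[i] * E_out[i][j] * p[j] for i in range(3) for j in range(3))
        rows.append([x, y, 1.0, k])   # coefficients of (h, h44) in p^T h + h44 k
        rhs.append(math.sqrt(val))    # same sign for every point (Lemma 2)
    h1, h2, h3, h44 = solve(rows, rhs)
    h = (h1, h2, h3)
    # E follows from E' = h h^T - h44 E.
    E = [[(h[i] * h[j] - E_out[i][j]) / h44 for j in range(3)] for i in range(3)]
    return [E[0] + [h1], E[1] + [h2], E[2] + [h3], [h1, h2, h3, h44]]
```

For the quadric $x^2 + y^2 + (k-2)^2 = 5$ with outline conic $E' = \mathrm{diag}(-1, -1, 5)$, the points $(2, 0)$ with $k = 1$ and $(1, 0)$, $(0, 1)$, $(-1, 0)$ with $k = 0$ give back $H$ with its overall sign flipped, which represents the same quadric.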

We have, thus, a linear algorithm for obtaining $H$
from an outline conic in one view (represented by $E'$)
and four corresponding points across both views. Note
that the use of the opacity constraint, via Lemma 2,
is less obvious than in the case of reconstruction from
point correspondences only. In the case of points
(Theorem 1) the opacity constraint was not needed for
recovering $H$, simply because a quadric is uniquely defined by
nine points and Lemma 1 provided a simple means for
reconstructing the projective coordinates of those nine
points. The opacity constraint is needed later only to
determine which of the two possible intersection points
of $Q$ with the line of sight projects onto the second view.
One could trivially extend the case of points to the case
of conic and points by first reconstructing a conic of $Q$
from two projections, since a quadric is uniquely determined
by a conic and four points. However, this is not what is
done here.

For practical reasons, it would not be desirable to rely 
on observing a conic section in both views as this would 
significantly reduce the generality of our results. In other 
words, the basic axiom that an object can be represented 
as a cloud of points would need to be restricted by the 
additional requirement that some of those points should 
lie on a conic section in space — not to mention that we 
would have to somehow identify which of the points lie 
on a conic section. 

As an alternative, we observe the projection of the
rim in one of the views and derive the equations for
reconstructing the quadric from its outline and four
corresponding points. In this case we do not have a conic
of $Q$ and four points, and thus it is not a priori clear
that a unique reconstruction is possible. Indeed,
Theorem 4 shows that without the opacity constraint we
have at most eight solutions (16 modulo a common sign
flip). This follows from the indeterminacy of whether
the conic, projecting onto $\psi_1$, is in front or behind (with
respect to $O$) each of the four points. Since the conic
in question is the rim, under the opacity assumption the
rim is either behind all the points or in front of all the
points. Since these two situations differ by a reflection,
they correspond to the same quadric (i.e., $H$ up to scale).
Thus, the opacity constraint is used here twice: first
to recover the quadric's representation $H$, and second
to determine later (as in Theorem 1) which of the two
intersections with the line of sight projects onto the
second view $\psi_2$. Finally, note that this could have worked
only with the rim, and not with any other conic of $Q$,
unless we observe it in both views.

5 Application to Correspondence 

In the previous sections we developed the tools for recov- 
ering a unique mapping between two projections of an 
opaque quadric surface. In this section we derive an ap- 
plication of Theorems 1 and 4 to the problem of achiev- 
ing full correspondence between two grey-level images of 
a general 3D object. 

5.1 Algorithm Using Points Only 

For the task of visual correspondence the mapping
between two views of a quadric surface will constitute the
"nominal quadratic transformation" which, in the case
of points, can be formalized as a corollary of Theorem 1
as follows:

Corollary 2 (of Theorem 1) A virtual quadric sur- 
face can be fitted through any 3D surface, not necessarily 
a quadric surface, by observing nine corresponding points 
across two views of the object. 

Proof. It is known that there is a unique quadric sur- 
face through any nine points in general position. This 
follows from a Veronese map of degree two, i>2 : V" > 



•p(n + l)(n+2)/2-l ; defined by (^ ...,£„),_,.(. 



•). 



where x ranges over all monomials of degree two in 
xo, . . . , x n . For n = 3, this is a mapping from V 3 to V 9 
taking hypersurfaces of degree two in V 3 (i.e., quadric 
surfaces) into hyperplane sections of V 9 . Thus, the sub- 
set of quadric surfaces passing through a given point in 
V 3 is a hyperplane in V 9 , and since any nine hyperplanes 
in V 9 must have a common intersection, there exists a 
quadric surface through any given nine points. If the 
points are in general position this quadric is smooth (i.e., 
H is of full rank). 

Therefore, by selecting any nine corresponding points 
(barring singular configurations) across the two views 
we can apply the construction described in Theorem 1 
and represent the displacement between corresponding 
points p and p' across the two views as follows: 

$$p' \cong (Ap + k_q v') + k_r v', \qquad (3)$$

where $k = k_q + k_r$. Moreover, $k_q$ is the relative affine
structure of the virtual quadric and $k_r$ is the remaining
parallax which we call the residual. The term within
parentheses is the nominal quadratic transformation,
and the remaining term $k_r v'$ is the unknown displacement
along the known direction of the epipolar line.
Therefore, Equation 3 is the result of representing the
relative affine structure of a 3D object with respect to
some reference quadric surface, namely, $k_r$ is a relative
affine invariant (because $k$ and $k_q$ are both invariants by
Lemma 1). $\Box$
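
The counting argument in the proof can be checked numerically. The following Python sketch is not part of the original work, and the function names `veronese_row` and `quadric_through_points` are ours. It builds the degree-two Veronese embedding of nine points in $\mathcal{P}^3$ and recovers the quadric as the null vector of the resulting 9 x 10 linear system. It assumes the points are in general position; points sampled on a unit sphere serve purely as test data.

```python
import numpy as np

def veronese_row(X):
    """Degree-two monomials of a homogeneous 4-vector X, weighted so that
    row . h == X^T H X when h holds the upper triangle of a symmetric H."""
    iu = np.triu_indices(4)
    weights = np.where(iu[0] == iu[1], 1.0, 2.0)  # off-diagonal terms occur twice
    return np.outer(X, X)[iu] * weights

def quadric_through_points(points):
    """Symmetric 4x4 matrix H (up to scale) with X^T H X = 0 for every
    homogeneous point X in `points` (shape (N, 4), N >= 9)."""
    M = np.array([veronese_row(X) for X in points])
    # Null vector of M: right singular vector of the smallest singular value.
    h = np.linalg.svd(M)[2][-1]
    H = np.zeros((4, 4))
    H[np.triu_indices(4)] = h
    return H + np.triu(H, 1).T

# Test data (an assumption): nine points on the sphere x^2 + y^2 + z^2 - 1 = 0.
rng = np.random.default_rng(0)
xyz = rng.normal(size=(9, 3))
xyz /= np.linalg.norm(xyz, axis=1, keepdims=True)
points = np.hstack([xyz, np.ones((9, 1))])

H = quadric_through_points(points)
H /= H[0, 0]  # fix the arbitrary overall scale
residuals = np.einsum("ni,ij,nj->n", points, H, points)
```

With this input the recovered H should be proportional to diag(1, 1, 1, -1), the matrix of the unit sphere, which is of full rank as the proof states for points in general position.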

Note that the corollary is analogous to describing 
shape with respect to a reference plane [9, 26, 28] — in- 
stead of a plane we use a quadric and use the tools result- 
ing from Theorem 1 in order to establish a quadric ref- 
erence surface. The overall algorithm for achieving full 
correspondence given nine corresponding points $p_j, p'_j$,
$j = 0, 1, \ldots, 8$, is summarized below:

1. Determine the epipoles $v, v'$. This can be done using
eight corresponding points to first determine
the "fundamental" matrix $F$ satisfying $p'^T_j F p_j = 0$,
$j = 1, \ldots, 8$. The epipoles follow by $Fv = 0$ and
$F^T v' = 0$ (cf. [4, 5, 27, 28]).

2. Recover the homography $A$ from the equations
$A p_j \cong p'_j$, $j = 1, 2, 3$, and $Av \cong v'$ [27, 21]. This
leads to a linear system of eight equations for solving
for $A$ up to a scale. Scale $v'$ to satisfy $p'_0 \cong A p_0 + v'$.

3. Compute $k_j$, $j = 4, \ldots, 8$, from the equation
$p'_j \cong A p_j + k_j v'$. A least-squares solution is given by the
following formula:

$$k_j = \frac{(p'_j \times v')^T (A p_j \times p'_j)}{\| p'_j \times v' \|^2}.$$

4. Compute the quadric parameters from the nine
equations

$$(x_j, y_j, 1, k_j) \, H \, (x_j, y_j, 1, k_j)^T = 0, \qquad (4)$$

for $j = 0, 1, \ldots, 8$. Note that $k_0 = 1$ and $k_1 = k_2 = k_3 = 0$.

5. For every other point $p$ compute $k_q$ as the appropriate
root $k$ of $a k^2 + b(p) k + c(p) = 0$,
where the coefficients $a, b, c$ follow from
$(x_q, y_q, 1, k_q) H (x_q, y_q, 1, k_q)^T = 0$, and the appropriate
root follows from the sign of $r$ for
$a k^2 + b(p_0) k + c(p_0) = 0$ consistent with the root $k_0 = 1$.

6. Warp $\psi_1$ according to the nominal transformation

$$\bar{p} \cong Ap + k_q v'.$$

Thus, the image brightness at any $p \in \psi_1$ is copied
onto the transformed location $\bar{p}$.

7. The remaining displacement (residual) between $p'$
and $\bar{p}$ consists of an unknown displacement $k_r$ along
the known epipolar line:

$$p' \cong \bar{p} + k_r v'.$$

The spatio-temporal derivatives of image brightness
can be used to recover $k_r$.
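
Steps 1 to 3 of the algorithm above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used for the experiments. The fundamental matrix is estimated with the plain linear eight-point method (no coordinate normalization), the homography with a direct linear transform, and the two views are synthesized from hypothetical cameras [I | 0] and [R | t]. All function names are ours.

```python
import numpy as np

def skew(a):
    """Matrix [a]_x such that skew(a) @ b == np.cross(a, b)."""
    return np.array([[0, -a[2], a[1]], [a[2], 0, -a[0]], [-a[1], a[0], 0]])

def null_vector(M):
    """Unit vector minimizing |M x| (last right singular vector)."""
    return np.linalg.svd(M)[2][-1]

def fundamental_matrix(p, p2):
    """Step 1: linear estimate of F with p2_j^T F p_j = 0 (N >= 8 matches)."""
    return null_vector(np.array([np.kron(b, a) for a, b in zip(p, p2)])).reshape(3, 3)

def epipoles(F):
    """Epipoles v, v' satisfying F v = 0 and F^T v' = 0."""
    return null_vector(F), null_vector(F.T)

def homography(src, dst):
    """Step 2: A (up to scale) with A src_i ~ dst_i, from four matches."""
    rows = [np.kron(skew(d), s.reshape(1, 3)) for s, d in zip(src, dst)]
    return null_vector(np.vstack(rows)).reshape(3, 3)

def relative_affine_structure(A, v2, p, p2):
    """Step 3: least-squares k in p2 ~ A p + k v2."""
    n = np.cross(p2, v2)
    return n @ np.cross(A @ p, p2) / (n @ n)

# Synthetic two-view data (an assumption for illustration only).
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(12, 3)) + np.array([0.0, 0.0, 5.0])
c, s = np.cos(0.2), np.sin(0.2)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t = np.array([0.5, 0.1, 0.2])
p = X / X[:, 2:3]          # first view
Y = X @ R.T + t
p2 = Y / Y[:, 2:3]         # second view

F = fundamental_matrix(p, p2)
v, v2 = epipoles(F)
# Reference plane through points 1, 2, 3; the epipoles give the fourth match.
A = homography(np.vstack([p[1:4], v]), np.vstack([p2[1:4], v2]))
# Scale v' so that the reference point 0 has k_0 = 1.
v2 = v2 * relative_affine_structure(A, v2, p[0], p2[0])
k = np.array([relative_affine_structure(A, v2, a, b) for a, b in zip(p, p2)])

# Check p'_j ~ A p_j + k_j v' (equality up to scale, so compare directions).
q = p @ A.T + k[:, None] * v2
residual = np.linalg.norm(np.cross(p2, q), axis=1) / (
    np.linalg.norm(p2, axis=1) * np.linalg.norm(q, axis=1))
```

In the noise-free case the three points defining the reference plane receive $k_j = 0$ and the reference point receives $k_0 = 1$, matching the note in step 4. With measured data a normalized eight-point estimate would usually be preferable to the unnormalized one used here.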

This algorithm was implemented and applied to the 
pair of images displayed in the top row of Fig. 2. Note
that displacements between corresponding points around
the center region of the face are typically about 20 pixels.
Achieving full correspondence between two views of
a face is challenging for two reasons. First, a face is a 
complex object which is not easily parameterized. Sec- 
ond, the texture of a typical face does not contain enough 
image structure to obtain point-to-point correspondence 
in a reliable manner. However, there are a few points 
(on the order of 10-20) that can be reliably matched, 
such as the corners of the eye, mouth and eyebrows. We 
rely on these few points to recover the epipolar geometry 
and the nominal quadratic transformation. 

Fig. 3 displays the results in the following manner. 
The top row display shows the original second view. 
Notice that the transformed first view (middle row dis- 
play) appears to be heading in the right direction but is 
slightly deformed. The selection of corresponding points 
(selected manually) yielded an ellipsoid whose outline 
on the first view circumscribes the image of the head 
(this is not a general phenomenon; see later in this section).
The overlay between the edges of the original second
view and the edges of the transformed view is also
shown in the middle row display. Notice that the residuals
are relatively small, typically in the range of 1-2
pixels. The residuals are subsequently recovered by us- 
ing a coarse-to-fine gradient-based optical flow method 
following [11, 3] constrained along epipolar directions (cf. 
[24]). The final results are shown in the bottom row dis- 
play. 
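
The idea of constraining a gradient-based flow estimate to the epipolar direction can be illustrated with a single-window, single-resolution sketch. This is not the coarse-to-fine method of [11, 3] used above. It assumes a constant, known unit direction over the window (in practice the direction varies with position and points toward the epipole), and the function name and test pattern are ours. Writing the displacement as s times the direction d, the linearized brightness constancy equation reduces to one unknown per window.

```python
import numpy as np

def epipolar_flow_magnitude(img1, img2, direction):
    """Least-squares displacement s along the unit `direction` (dx, dy) such
    that img2(x + s * direction) ~ img1(x), from the linearized brightness
    constancy equation (grad I . direction) * s + I_t = 0."""
    mean = 0.5 * (img1 + img2)
    gy, gx = np.gradient(mean)       # derivatives along rows (y) and columns (x)
    it = img2 - img1                 # temporal derivative
    g = gx * direction[0] + gy * direction[1]
    # Drop the border, where np.gradient falls back to one-sided differences.
    g, it = g[1:-1, 1:-1], it[1:-1, 1:-1]
    return -np.sum(g * it) / np.sum(g * g)

# Synthetic check: a smooth pattern shifted by 0.3 pixel along d.
def pattern(x, y):
    return np.sin(0.2 * x) + np.cos(0.15 * y) + np.sin(0.1 * (x + y))

y, x = np.mgrid[0:64, 0:64].astype(float)
d = np.array([0.6, 0.8])
shift = 0.3
img1 = pattern(x, y)
img2 = pattern(x - shift * d[0], y - shift * d[1])
estimate = epipolar_flow_magnitude(img1, img2, d)
```

Because the unknown is a single scalar along a known direction, the estimate remains well defined even where the image gradient has only one dominant orientation, as long as that orientation is not perpendicular to the epipolar line.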

Also, a tight fit of a quadric surface onto the object 
can be obtained by using many corresponding points to
obtain a least-squares solution for H. Note that from a
practical point of view we would like the quadric to lie as
close as possible to the object; otherwise the algorithm,
though correct, would not be useful, i.e., the residuals
may be larger than the original displacements between
the two views. In this regard, the re-parameterization
suggested in Theorem 2 may provide a better fit for
least-squares methods. Using the parameterization described
in the theorem, the entries $h_{11}, h_{22}, h_{33}$ vanish, leaving
only six parameters of H to be determined (see proof of 
Theorem 2). Thus, instead of recovering nine parameters 
in a least-squares solution, we solve for only six param- 
eters, which is equivalent to constraining the resulting 
quadric to lie on three object points. The implementa- 
tion steps described above should be modified in a way 
that readily follows from the proof of Theorem 2. 
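
A least-squares fit of H from many samples, followed by the root selection of step 5, might look as follows. The sketch uses the general ten-entry parameterization of H and does not implement the re-parameterization of Theorem 2. The root is selected by comparing both roots at the reference point with the known value $k_0 = 1$, which is one simple way to realize the sign rule of step 5. The function names and the synthetic quadric $x^2 + y^2 + (k - 2)^2 = 9$ are assumptions made for illustration.

```python
import numpy as np

IU = np.triu_indices(4)
WEIGHTS = np.where(IU[0] == IU[1], 1.0, 2.0)

def fit_quadric(x, y, k):
    """Least-squares H (up to scale) with (x, y, 1, k) H (x, y, 1, k)^T ~ 0."""
    U = np.stack([x, y, np.ones_like(x), k], axis=1)
    M = np.einsum("ni,nj->nij", U, U)[:, IU[0], IU[1]] * WEIGHTS
    h = np.linalg.svd(M)[2][-1]      # minimizes |M h| subject to |h| = 1
    H = np.zeros((4, 4))
    H[IU] = h
    return H + np.triu(H, 1).T

def quadratic_coefficients(H, x, y):
    """Coefficients of a k^2 + b k + c = 0 for the image point (x, y, 1)."""
    u = np.array([x, y, 1.0])
    return H[3, 3], 2.0 * H[:3, 3] @ u, u @ H[:3, :3] @ u

def root_sign(H, x0, y0, k0):
    """Sign of the square-root term that reproduces the known root k0."""
    a, b, c = quadratic_coefficients(H, x0, y0)
    sq = np.sqrt(b * b - 4 * a * c)
    roots = {s: (-b + s * sq) / (2 * a) for s in (1.0, -1.0)}
    return min(roots, key=lambda s: abs(roots[s] - k0))

def k_q(H, x, y, sign):
    """Selected root at (x, y), or None when the roots are complex."""
    a, b, c = quadratic_coefficients(H, x, y)
    disc = b * b - 4 * a * c
    if disc < 0:
        return None
    return (-b + sign * np.sqrt(disc)) / (2 * a)

# Synthetic samples on the sheet k = 2 - sqrt(9 - x^2 - y^2); the reference
# point (2, 2) has k_0 = 1 on that sheet.
rng = np.random.default_rng(2)
xs = np.append(rng.uniform(-1.5, 1.5, 40), 2.0)
ys = np.append(rng.uniform(-1.5, 1.5, 40), 2.0)
ks = 2.0 - np.sqrt(9.0 - xs**2 - ys**2)

H = fit_quadric(xs, ys, ks)
H /= H[3, 3]                               # fix the arbitrary scale
sign = root_sign(H, 2.0, 2.0, 1.0)
inside = k_q(H, 0.5, -1.0, sign)           # point inside the outline
outside = k_q(H, 4.0, 0.0, sign)           # point outside the outline
```

Points outside the outline of the fitted quadric return `None`, which corresponds to the complex conjugate roots discussed in the next section.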

We have seen that the quadric's outline in the example 
shown in Fig. 3 circumscribes the image of the object. 
This, however, is not a general property and the issue 
is taken further in the next section where the results of 
Theorem 4 become relevant and practical. 

5.2 Algorithm Using Conic and Points 

When only point matches are used, one cannot guarantee
that the outline of the recovered quadric will circumscribe
the image of the object. Some choices of corresponding
points may give rise to a quadric whose outline
happens to fall within the image of the object. Fig. 4
illustrates this possibility on a different face-pair. One 
can see that the outline of the quadric (again an ellip- 
soid) encompasses all sample points, but inscribes the 
image of the head, leaving out the peripheral region. 

In general, points p outside the outline correspond to
rays OP that do not intersect the quadric in real space,
and therefore the corresponding $k_q$ are complex conjugate
(i.e., the nominal quadratic transformation cannot
be applied to p). This is where Theorem 4 becomes
useful in practice. We have shown there that instead 
of nine corresponding points, the outline of the quadric 
and four corresponding points are sufficient for uniquely 
determining the mapping between the two views due to 
the quadric. In the context of visual correspondence, 
the outline conic can be set arbitrarily (such as circum- 
scribing the image of the object of interest), and the 
rest follows from Theorem 4. This is formalized in the 
following corollary: 

Corollary 3 (of Theorem 4) A virtual quadric sur- 
face lying on four object points projecting onto an ar- 
bitrary outline (conic) can be fitted through any 3D sur- 
face, not necessarily a quadric surface, by observing the 
corresponding four point matches across two views of the 
object. 

The algorithm for recovering a virtual quadric refer- 
ence surface, by setting an arbitrary conic in the first 
view, is summarized below. We are given four corresponding
points $p_j, p'_j$, $j = 0, 1, 2, 3$, and the epipoles
$v, v'$. The homography A due to the plane of reference
passing through $P_j$, $j = 1, 2, 3$, is recovered as before
(steps 1 and 2 of the point-based algorithm). The rest
goes as follows:




Figure 3: Nominal quadratic transformation from nine corresponding points and subsequent refinement of residual displacements
using optical flow. (a) Original second view $\psi_2$ (the first view, $\psi_1$, is shown in Fig. 2). Nine corresponding points were
manually chosen. The needle heads mark the positions of the sampling points in $\psi_2$ and the needles denote the corresponding
displacement vectors. (b) The view $\psi_1$ warped using the nominal quadratic transformation. (c) The residual displacement
shown by overlaying the edges of (a) and (b). Note that the typical displacements are within 1-2 pixels (the original displacements
were in the range of 20 pixels; see Fig. 2). (d) The image in (b) warped further by applying optic flow along epipolar
lines towards $\psi_2$. (e) The performance of the correspondence strategy (nominal transformation followed by optical flow) is
illustrated by overlaying the edges of (a) and (d). Note that correspondence has been achieved to within subpixel accuracy
almost everywhere.





Figure 4: Nominal quadratic transformation from nine corresponding points: a case where the quadric's outline inscribes
the image of the object. (a) $\psi_1$. (b) $\psi_2$ with overlaid corresponding points used for recovering the nominal quadratic
transformation. (c) The recovered quadric (values of $k_q$). The uniform grey background indicates complex conjugate values
for the roots. (d) The overlaid edges of $\psi_1$ and $\psi_2$ masked by the ellipse having real roots. (e) The masked region of the
transformed first view. (f) The overlaid edges of $\psi_2$ (b) and the transformed view (e). Note that within the masked area of
real $k_q$-values, the residuals are fairly small.





Figure 5: Nominal quadratic transformation recovered from a conic and four corresponding points. (a) Original view $\psi_2$ with
corresponding points overlaid. (b) The transformed first view, $\psi_1$, within the given conic (circle around the face in this
example).

Figure 6: The nominal quadratic transformation due to a hyperboloid
of two sheets. This unintuitive solution due to a
deliberately unsuccessful choice of sample points creates the
mirror image on the right side that is due to the second sheet
of the hyperboloid.



1. Select an arbitrary conic $p^T E' p = 0$ (presumably
one that circumscribes the image of the object in
the first view).

2. Solve for vector $h$ and scalar $h_{44}$ from the system

$$p_j^T h + h_{44} k_j = \sqrt{p_j^T E' p_j},$$

$j = 0, 1, 2, 3$. Note that $k_0 = 1$ and $k_1 = k_2 = k_3 = 0$.

3. The parameter matrix $H$ representing the quadric
$Q$ is given by

$$H = \begin{pmatrix} h h^T - E' & h_{44} h \\ h_{44} h^T & h_{44}^2 \end{pmatrix}.$$

The remaining steps are the same as steps 5, 6, and 7 in
the point-based algorithm.
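
The three steps above amount to one 4 x 4 linear solve followed by assembling H. The sketch below is an illustration under assumptions and not the original implementation: the conic matrix (written `E` for $E'$) is scaled so that $p^T E' p$ is positive inside the conic, which keeps the square root real; the circle of radius 2 and the four reference points are hypothetical; the function names are ours. For this H the discriminant of the quadratic in $k$ equals $4 h_{44}^2 \, p^T E' p$, so it vanishes exactly on the chosen conic, which is how the outline is prescribed.

```python
import numpy as np

def quadric_from_conic(E, p, k):
    """Build H from the conic matrix E (p^T E p = 0, positive inside) and four
    reference points p[j] = (x_j, y_j, 1) with relative affine structure k[j]."""
    rhs = np.sqrt(np.einsum("ji,ik,jk->j", p, E, p))
    # Unknowns (h_1, h_2, h_3, h_44); row j of the system is (p_j^T, k_j).
    sol = np.linalg.solve(np.hstack([p, k[:, None]]), rhs)
    h, h44 = sol[:3], sol[3]
    H = np.empty((4, 4))
    H[:3, :3] = np.outer(h, h) - E
    H[:3, 3] = H[3, :3] = h44 * h
    H[3, 3] = h44 ** 2
    return H

def discriminant(H, u):
    """Discriminant of a k^2 + b k + c = 0 at the image point u = (x, y, 1)."""
    u = np.asarray(u, dtype=float)
    a, b, c = H[3, 3], 2.0 * H[:3, 3] @ u, u @ H[:3, :3] @ u
    return b * b - 4.0 * a * c

# Hypothetical input: a circle of radius 2 around the origin, four points inside.
E = np.diag([-1.0, -1.0, 4.0])
p = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [-1.0, -0.5, 1.0]])
k = np.array([1.0, 0.0, 0.0, 0.0])

H = quadric_from_conic(E, p, k)
U = np.hstack([p, k[:, None]])
on_quadric = np.einsum("ji,ik,jk->j", U, H, U)   # reference points lie on Q
```

Image points inside the conic yield a positive discriminant and hence real $k_q$, points on the conic a double root, and points outside it complex roots.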

This algorithm was also implemented and applied to 
the pair of images used earlier (top row of Fig. 2). The 
arbitrary conic was chosen to be a circle circumscribing 
the image of the head in one view. Fig. 5 shows the 
original second view and the warped first view according 
to the recovered quadratic nominal transformation due 
to the conic and only four corresponding points. 

Finally, although ellipsoids and paraboloids are the 
most natural quadric surfaces for this application, we 
cannot (in principle) eliminate other classes of quadrics 
from appearing in this framework. For example, a hyper- 
boloid of two sheets may yield unintuitive results, under 
specialized circumstances (see Fig. 6). Since the recov- 
ered quadrics are in real space, a certain limited classi- 
fication is possible (based on the ranks of the matrices 
and the sign pattern of the eigenvalues of H), but un- 
fortunately that classification is not sufficient to elimi- 
nate hyperboloids of two sheets. In practice, however, 
the situation illustrated in Fig. 6 is accidental and was 
contrived for purposes of illustrating this kind of failure 
mode. 
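
The limitation of an eigenvalue-based classification can be illustrated numerically. The sketch below is our illustration; the function name `signature` and the three example matrices, given in Euclidean coordinates (x, y, z, 1), are assumptions. It computes the sign pattern of the eigenvalues of a symmetric 4 x 4 matrix up to overall sign. An ellipsoid and a hyperboloid of two sheets share the same pattern, whereas a hyperboloid of one sheet does not.

```python
import numpy as np

def signature(H, tol=1e-9):
    """(larger, smaller) counts of same-signed eigenvalues of a symmetric H.
    Reported up to the overall sign, since H and -H describe the same quadric."""
    w = np.linalg.eigvalsh(H)
    pos, neg = int(np.sum(w > tol)), int(np.sum(w < -tol))
    return (max(pos, neg), min(pos, neg))

ellipsoid = np.diag([1.0, 1.0, 1.0, -1.0])      # x^2 + y^2 + z^2 = 1
two_sheets = np.diag([-1.0, -1.0, 1.0, -1.0])   # z^2 - x^2 - y^2 = 1
one_sheet = np.diag([1.0, 1.0, -1.0, -1.0])     # x^2 + y^2 - z^2 = 1

sig_ellipsoid = signature(ellipsoid)
sig_two_sheets = signature(two_sheets)
sig_one_sheet = signature(one_sheet)
```

Both the ellipsoid and the two-sheet hyperboloid give the pattern (3, 1), so the sign pattern of H alone can rule out a hyperboloid of one sheet, with pattern (2, 2), but cannot separate the former two.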

6 Discussion 

The theoretical part of this paper addressed the ques- 
tion of establishing a one-to-one mapping between two 
views of an unknown quadric surface. We have shown
that nine corresponding points are sufficient to obtain a
unique map, provided we make the assumption that the 
surface is opaque. Similarly, four corresponding points 
and the outline conic of the quadric in one view are sufficient
to obtain a unique map as well. We have also
shown that an appropriate parameterization of the image
planes helps answer certain questions of interest, such
as how likely it is that eight corresponding points come
from a quadric lying in the vicinity of both centers of
projection.

On the practical side, we have shown that the tools 
developed for quadrics can be applied to any 3D ob- 
ject by setting up a virtual quadric surface lying in the 
vicinity of the object. The quadric serves as a reference 
surface, but also facilitates the correspondence problem. 
For example, given the epipoles (which can be recov- 
ered independently), by specifying a conic circumscrib- 
ing the image of an object in one view and observing 
four corresponding points with the other view one can 
obtain the virtual quadric surface whose rim projects to 
the specified outline conic and which lies on the four 
corresponding object points in 3D space. The virtual 
quadric induces a unique mapping between the two views 
(the nominal quadratic transformation), which is equiv- 
alent to projecting the object onto the quadric along 
the projection lines toward the first view, followed by a 
projection of the quadric onto the second view. What 
remains are residual displacements along epipolar lines 
whose magnitudes are small in regions where the object
lies close to the virtual quadric. The residual displace- 
ments are later refined by use of local spatio-temporal 
detectors that implement the constant brightness equa- 
tion, or any correlation scheme (cf. [14, 22, 1]), along 
the epipolar lines. In the implementation section we have 
shown that two views of a face with typical displacements 
of around 20 pixels are brought closer to displacements 
of around 1-2 pixels by the transformation. Most opti- 
cal flow methods can deal with such small displacements 
quite effectively. 

On the conceptual level, two proposals were made. 
First, the correspondence problem is treated as a two- 
stage process combining geometric information captured 
by a small number of point matches, and photometric 
information captured by the spatio-temporal derivatives 
of image brightness. Second, manipulations on 3D ob- 
ject space are achieved by first manipulating a reference 
surface. The reference surface is viewed here as an ap- 
proximate prototype of the observed object, and shape is 
measured relative to the prototype rather than relative 
to a generic (minimal) frame of reference. 

The notion of reference surfaces as prototypes may be 
relevant for visual recognition, visual motion and stereopsis.
In some of these areas one may find some support
for this notion in the human vision literature, although
not directly. For example, the phenomenon of "motion 
capture" introduced by Ramachandran [18, 19, 20] is 
suggestive of the kind of motion measurement presented 
here. Ramachandran and his collaborators observed that 
the motion of certain salient image features (such as grat- 
ings or illusory squares) tends to dominate the perceived 
motion in the enclosed area by masking incoherent motion
signals derived from uncorrelated random dot patterns,
in a winner-take-all fashion. Ramachandran therefore
suggested that motion is computed by using salient
features that are matched unambiguously and that the 
visual system assumes that the incoherent signals have 
moved together with those salient features [18]. The 
scheme suggested in this paper may be considered as 
a refinement of this idea. Motion is "captured" in
Ramachandran's sense by the reference surface, not by
assuming the motion of the salient features but by
computing the nominal motion transformation. The nominal
motion is only a first approximation which is further re- 
fined by use of spatio-temporal detectors, provided that 
the remaining residual displacement is in their range, 
namely, the object being tracked and the reference sur- 
face model are sufficiently close. In this view the effect of 
capture attenuates with increasing depth of points from 
the reference surface, and is not affected, in principle, 
by the proximity of points to the salient features in the 
image plane. 

Other suggestive data include stereoscopic interpola- 
tion experiments by Mitchison and McKee [16]. They 
describe a stereogram which has a central periodic re- 
gion bounded by unambiguously matched edges. Under 
certain conditions the edges impose one of the expected 
discrete matchings (similar to stereoscopic capture; see 
also [17]). Under other conditions a linear interpolation 
in depth occurs between the edges, violating any possible
point-to-point match between the periodic regions. The 
linear interpolation in depth corresponds to a plane pass- 
ing through the unambiguously matched points, which 
supports the idea that correspondence starts with the 
computation of nominal motion (in this case due to a 
planar reference surface), determined by a small number 
of salient unambiguously matched points, and is later 
refined using short-range mechanisms. 

To conclude, the computational results provide tools 
for further exploring the utility of reference surfaces in 
visual applications, and provide specific applications to 
the task of visual correspondence (visual motion).